37  Kruskal-Wallis Test

37.1 Kruskal-Wallis Test

The Kruskal-Wallis test is a non-parametric, rank-based test used to determine whether three or more independent groups come from the same distribution. When the group distributions have a similar shape, it can be read as a test for differences between the group medians. It extends the Mann-Whitney U test from two groups to several, and it is particularly useful when the assumptions of one-way ANOVA (such as normality) cannot be met.

37.1.1 Assumptions

The Kruskal-Wallis test relies on the following assumptions:

  1. Independence of Samples: The groups are independent of one another.
  2. Ordinal or Continuous Data: The response variable should be measured on at least an ordinal scale, so that the observations can be ranked.
  3. Similarity of Shape: The group distributions should have a similar shape and spread, so that a difference between groups can be interpreted as a difference in medians.

37.1.2 Hypotheses

The hypotheses for the Kruskal-Wallis test are as follows:

  • Null Hypothesis (H₀): The medians of all groups are equal.
  • Alternative Hypothesis (H₁): At least one group’s median is different from the others.

37.1.3 Formula

The test statistic (H) is calculated as follows:

\[ H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1) \]

where:

  • \(n\) is the total number of observations.
  • \(k\) is the number of groups.
  • \(R_i\) is the sum of ranks in the \(i^{th}\) group.
  • \(n_i\) is the number of observations in the \(i^{th}\) group.
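The formula above assumes that no two observations share the same value. When ties are present, statistical software typically divides H by a correction factor. As a supplementary note (the symbols \(H_c\), \(g\), and \(t_j\) are introduced here and do not appear elsewhere in this chapter), the usual tie-corrected statistic is:

\[ H_c = \frac{H}{1 - \dfrac{\sum_{j=1}^{g} \left(t_j^3 - t_j\right)}{n^3 - n}} \]

Here \(g\) is the number of distinct values that occur more than once and \(t_j\) is the number of observations tied at the \(j^{th}\) such value. With no ties the denominator equals 1 and \(H_c = H\).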

37.1.4 Calculation Steps

  1. Rank all data from all groups together; the lowest value gets rank 1, the next lowest rank 2, and so on. Tied values each receive the average of the ranks they would otherwise occupy.
  2. Calculate the sum of ranks for each group.
  3. Use the formula to calculate the H statistic.
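The three steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names `average_ranks` and `kruskal_wallis_h` and the toy data are hypothetical, tied values are assumed to share the average of the ranks they occupy, and no tie correction is applied to H.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (rank sums per group, H) for a list of groups of numbers."""
    # Step 1: pool all observations and rank them together.
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Step 2: sum the ranks that belong to each group.
    rank_sums = []
    start = 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)

    # Step 3: plug the rank sums into the formula for H.
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return rank_sums, h


# Hypothetical toy data: three groups of three observations, no ties.
rank_sums, h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(rank_sums, round(h, 4))  # rank sums 6, 15, 24 and H = 7.2
```

For the toy data the rank sums are 6, 15, and 24, so H = (12 / 90) × (12 + 75 + 192) − 30 = 7.2.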

37.1.5 Interpretation

A large value of H is evidence against the null hypothesis. Under the null hypothesis, H approximately follows a chi-square distribution with \(k-1\) degrees of freedom. If the calculated H is greater than the critical value from the chi-square table at the chosen significance level (equivalently, if the p-value is below that level), the null hypothesis is rejected.
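As a sketch of this decision rule, the critical value and p-value can be obtained from SciPy's chi-square distribution instead of a printed table. The helper name `kruskal_wallis_decision`, the default significance level of 0.05, and the input value H = 9.0 are illustrative assumptions, not values taken from this chapter.

```python
from scipy.stats import chi2


def kruskal_wallis_decision(h, k, alpha=0.05):
    """Compare H against the chi-square distribution with k - 1 df."""
    df = k - 1
    critical = chi2.ppf(1 - alpha, df)  # upper-tail critical value
    p_value = chi2.sf(h, df)            # P(chi-square with df > h)
    return critical, p_value, bool(h > critical)


# Hypothetical input: H = 9.0 from a study with k = 4 groups.
critical, p_value, reject = kruskal_wallis_decision(h=9.0, k=4)
print(f"critical value = {critical:.3f}")  # 7.815 for df = 3, alpha = 0.05
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Comparing H with the critical value and comparing the p-value with the significance level always lead to the same decision; the p-value simply carries more information.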

37.1.6 Example Problem

Let’s consider an example where a researcher wants to compare the effectiveness of four different medications. The response scores from patients are as follows:

  • Medication A: 67, 75, 74, 70
  • Medication B: 70, 65, 76, 68
  • Medication C: 82, 85, 87, 83
  • Medication D: 60, 59, 61, 65

Hypotheses:

  • Null Hypothesis (H₀): The median response scores for all four medications are the same.
  • Alternative Hypothesis (H₁): At least one medication’s median response score is different from the others.
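Before turning to software, it may help to see the intermediate quantities for this data set. The sketch below computes the rank sums and H by hand using NumPy and `scipy.stats.rankdata`. The scores 65 and 70 each occur twice, so the block also applies the standard tie correction, dividing H by 1 − Σ(t³ − t) / (n³ − n), where t is the size of each group of tied values. That correction is an addition here and is not part of the formula given earlier.

```python
import numpy as np
from scipy.stats import rankdata

groups = {
    "A": [67, 75, 74, 70],
    "B": [70, 65, 76, 68],
    "C": [82, 85, 87, 83],
    "D": [60, 59, 61, 65],
}

# Rank all 16 scores together; rankdata averages the ranks of tied values.
pooled = np.concatenate([np.asarray(v, dtype=float) for v in groups.values()])
ranks = rankdata(pooled)
n = len(pooled)

# Sum the ranks within each medication group.
rank_sums = {}
start = 0
for name, values in groups.items():
    rank_sums[name] = ranks[start:start + len(values)].sum()
    start += len(values)

# H from the formula (no tie correction).
h = 12 / (n * (n + 1)) * sum(
    r ** 2 / len(groups[name]) for name, r in rank_sums.items()
) - 3 * (n + 1)

# Tie correction: counts holds how often each distinct score occurs.
_, counts = np.unique(pooled, return_counts=True)
correction = 1 - (counts ** 3 - counts).sum() / (n ** 3 - n)
h_corrected = h / correction

print(rank_sums)              # A: 35.5, B: 32.0, C: 58.0, D: 10.5
print(round(h, 3))            # 12.513
print(round(h_corrected, 3))  # 12.55
```

The uncorrected value is about 12.513. The tie-corrected value of about 12.550 is the one reported by the R and Python functions used in the following sections.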

37.1.7 Kruskal-Wallis Test using Excel:

Download the Excel file link here

37.1.8 Kruskal-Wallis Test using R:

R
# Data for the medications
med_a <- c(67, 75, 74, 70)
med_b <- c(70, 65, 76, 68)
med_c <- c(82, 85, 87, 83)
med_d <- c(60, 59, 61, 65)

# Combine into a list
data <- list(Medication_A = med_a, Medication_B = med_b, Medication_C = med_c, Medication_D = med_d)

# Perform Kruskal-Wallis test
kw_test <- kruskal.test(data)

# Print the results
print(kw_test)

    Kruskal-Wallis rank sum test

data:  data
Kruskal-Wallis chi-squared = 12.55, df = 3, p-value = 0.005719

37.1.9 Kruskal-Wallis Test using Python:

Python
from scipy.stats import kruskal

# Data for the medications
med_a = [67, 75, 74, 70]
med_b = [70, 65, 76, 68]
med_c = [82, 85, 87, 83]
med_d = [60, 59, 61, 65]

# Perform Kruskal-Wallis test
statistic, p_value = kruskal(med_a, med_b, med_c, med_d)

# Print the results
print("Kruskal-Wallis statistic:", statistic, "P-value:", p_value)
Kruskal-Wallis statistic: 12.54977876106195 P-value: 0.005718662446349043

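With H ≈ 12.55 on 3 degrees of freedom and p ≈ 0.0057, the null hypothesis is rejected at the 5% significance level: at least one medication's response scores differ from the others. The test does not indicate which medications differ; that requires follow-up pairwise comparisons.

The Kruskal-Wallis test therefore offers a rank-based alternative to one-way ANOVA when the data do not meet ANOVA's assumptions, which makes it useful in fields such as medicine, psychology, and ecology.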